ABSTRACT

With the growing number of projects dedicated to the search for extrasolar planets via 
transits, there is a need to develop fast, automatic, robust methods with a statistical 
background in order to carry out the analysis efficiently. We propose a modified analysis
of variance (AoV) test particularly suitable for the detection of planetary transits
in stellar light curves. We show how labor savings by a factor of over 10 can
be achieved by careful organization of the computations. Based on a solid analytical
statistical formulation, we discuss the performance of our method and of other methods
for different signal-to-noise ratios and numbers of observations.

Key words: Methods: data analysis, Methods: statistical, Techniques: photometric, 
Surveys, (Stars:) planetary systems, (Stars:) oscillations (including pulsations)

1 INTRODUCTION

The search for extrasolar planets via transits has a venerable
history (Struve, 1952). However, it is the detection of the
transits of HD209458b by Charbonneau et al. (2000) and
the results from the OGLE survey (Udalski et al., 2002)
that have given a very strong boost to this field, and in
the last few years more than 20 ground-based experiments
have started (Horne, 2003). Although it is very simple in principle
to do a search for extrasolar planets via transits, the small
number of positive detections shows that most of the different
projects were over-optimistic in their initial estimates.
Independently of the problem of providing reliable photometry
on a large enough number of epochs, problems with
object selection and false alarms emerged prominently, as
shown by the radial velocity follow-up (Konacki et al., 2003;
Bouchy et al., 2005).

Further comments on this issue are found in
Alonso et al. (2003) and in other proceedings of the conferences
"Scientific Frontiers in Research on Extrasolar Planets"
(Deming & Seager, 2003) and "Extrasolar planets today
and tomorrow" (Beaulieu et al., 2005). From a statistical
point of view, the search for transits is merely a special
case of a period search at small signal-to-noise (S/N) ratio
with a known signal shape of short duty cycle, spanning
only a small fraction of the phase. On one hand, the space-based
missions COROT and Kepler, observing relatively few
targets, plan to use advanced and complex statistical
procedures (Defaÿ et al., 2001; Carpano et al., 2003;
Jenkins and Doyle, 2003 and references therein). The space
surveys differ from the ground ones discussed here both in
terms of the statistics (very long uninterrupted observations
with no atmospheric scintillation) and in the underlying
physics (e.g. planetary reflected light with a long duty cycle,
cf. Jenkins and Doyle, 2003). On the other hand, in the
ground surveys the data are noisy and interrupted by periodic
gaps. Transits occur with a rather short duty cycle and
periods comparable to the period of the gaps. Several approaches
to transit detection in ground data have already been tested in
practice, e.g. by Doyle et al. (2000) and Weldrake & Sackett
(2005). The low success ratio of the ground surveys calls for
massive searches with little object pre-selection. This
adds motivation for the development of robust and fast new
methods.

The present paper is devoted to a modification of the
analysis of variance (AoV) periodogram (Schwarzenberg-Czerny,
1989, Paper I) for the specific purpose of planetary
transit search (AoVtr) (Sect. 3) and to the discussion
of related issues of statistical (Sect. 4) and numerical
efficiency (Sect. 5). The properties of AoV-related methods,
known since the classical work of Fisher (1941) (see also
Fisz, 1963), are reviewed in Paper I and by Schwarzenberg-Czerny
(1999, Paper II). The AoV periodogram has proved to
be an efficient tool in space- and ground-based photometric
surveys of stellar variability by Hipparcos (van Leeuwen,
1997), OGLE (Udalski et al., 1994) and EROS (Beaulieu et al.,
1995; Beaulieu et al., 1997). For applications in planet search
see Cumming et al. (1999).

Detailed analytical results for the AoV transit method
are discussed in Sect. 4. We note that, in contrast, no analytical
results are available for other methods. Tingley (2003a)
reviewed the performance of methods suitable for planetary
transits by resorting to Monte Carlo simulations,
with mixed success. On one hand, in the
original work the results were distorted by a non-optimal
implementation of one method. This kind of problem is difficult
to spot because of the inherent lack of internal consistency
checks in Monte Carlo simulations. On the other hand, the
revisit of the problem with the revised implementation
(Tingley, 2003b) yielded a final result that is merely a
numerical illustration of the general result reached in Paper
II. From a numerical point of view, all the methods discussed
by Tingley (2003a) suffer from the drawback that they
demand repeated calculations for each phase of transit, which
increases the workload by a factor of several dozen. Hence
the potential advantage of a phase-independent method such as
the AoVtr introduced here.



2 ON THE ADVANTAGE OF THE PERIODOGRAM
SEARCH FOR VARIABILITY

A good comparison of the efficiency of periodogram and
period-independent variability search methods is provided
by the ordinary variance and the fast Fourier transform
(FFT) discrete power spectrum (DPS), for $N = NF$ observations
and frequencies. For a pure noise input, the suitably
normalized power spectrum has a $\chi^2(2)$ distribution with
expected value and standard deviation (s.d.) of 2 and $\sqrt{2}$,
respectively. For the variance we have $\chi^2(N)$, $N$ and $\sqrt{N}$.
Let us add to the input signal a sine oscillation such that
the power at its frequency in the DPS increases by 2, i.e. by 1.4 s.d.,
the other frequencies remaining unaffected. By virtue of Parseval's
theorem the variance is proportional to the sum of the
DPS, hence its corresponding increase is by $2/\sqrt{N}$ s.d. For
a large $NF = N$ this change in the variance becomes entirely
insignificant, while the corresponding change at the specific
frequency of the DPS is significant. Qualitatively this remains
true for the comparison of other frequency-independent and
frequency-dependent searches and for general uneven sampling.

For existing surveys extending over $NS \sim 10^7$ stars
observed $N \sim 10^3$ times over several years, $NF \sim 10^4$ would
be required for proper frequency sampling. For phase folding
and binning the number of operations scales as $O(NS \times N \times
NF) \sim 10^{14}$ operations. This circumstance prompted us to
search for a way of increasing the efficiency of transit search
methods in the present article.
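
To give a feeling for this scale, the count can be turned into a rough run time. The sustained rate of $10^9$ elementary operations per second used below is purely an illustrative assumption, not a figure from this paper:

```latex
NS \times N \times NF \sim 10^{7} \times 10^{3} \times 10^{4} = 10^{14},
\qquad
t \sim \frac{10^{14}}{10^{9}\,\mathrm{s^{-1}}} = 10^{5}\,\mathrm{s} \approx 1.2\ \mathrm{d}
```

Under this assumption a single pass over the survey takes of the order of a day, so any saving by a factor of several in the inner loop is directly noticeable.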

At first glance the standard FFT, demanding $O(NS \times
\log_2 NF \times NF)$ operations, appears to be an attractive
algorithm. Let $NH$ denote the rounded ratio of the orbital
period and the transit width. The duty cycle of transits is short, of
the order of $1/NH$, where $NH \sim 20 - 100$. It is
well known that to reach optimum sensitivity, the resolution
of the model function should match the incoming signal
(Schwarzenberg-Czerny, 1999). Thus an efficient detection of
a transit with the FFT requires $NH = 20 - 100$ harmonics, making
the total number of frequencies $NH \times NF \sim 10^6$. Then
the $\log_2 NF \times NH$ and $N$ factors become not so widely different,
but the FFT noise would come not from one but
from $NH = 20 - 100$ frequencies. However, for contiguous
data and smooth light curves of pulsating stars the FFT-related
algorithms are the methods of choice (Press and Rybicki,
1989).



3 METHOD DERIVATION 

There is no shortage of publications devoted to the interpretation
of phase folded and binned data (see e.g. Fisz, 1963
and Paper I for the theory and applications, respectively).
The underlying principle is to assume the null hypothesis,
$H_0$, that the data are fitted by a constant value and then to
test it against the alternative hypothesis $H_1(NH)$ employing
the phase binned light curve, corresponding to a step function.
Here we adopt only two phase bins of unequal width:
in and out of a transit. Let us assume that the width of the transit
is $v$ in phase units. Then binning corresponds to fitting the
following step function:

\begin{equation}
s(x) = \begin{cases} a & \text{for } 0 < x < v \\ b & \text{otherwise} \end{cases} \tag{1}
\end{equation}

where

\begin{equation}
a = \langle x_{\in T} \rangle \tag{2}
\end{equation}

and $N$ and $N_{\in T}$, $\langle x \rangle$ and $\langle x_{\in T} \rangle$ denote the number
of observations in total and in the transit and their corresponding
average values. Next we assume that the mean was
subtracted from the data, so that the current mean value
vanishes, $0 = N \langle x \rangle = N_{\in T}\, a + (N - N_{\in T})\, b$, hence

\begin{equation}
b = -\frac{N_{\in T}}{N - N_{\in T}}\, a \tag{3}
\end{equation}



The design of the analysis of variance test for transits
(AoVtr) becomes a special case of Schwarzenberg-Czerny
(1989) and Davies (1990), in the case where there are only 2
bins. Following the notation of Paper II, the AoVtr periodogram
statistic $\Theta$ is defined in terms of the sums of squares of the model
and of the observations, denoted respectively $\|x_\parallel\|^2$ and $\|x\|^2$.
Normally these sums are referred to as $\chi^2$ statistics. We prefer the vector
norm notation, in which the Fisher lemma reduces to the Pythagoras
theorem. The corresponding AoV statistic becomes:

\begin{equation}
\Theta = \frac{(N - N_\parallel)\, \|x_\parallel\|^2}{(N_\parallel - 1) \left( \|x\|^2 - \|x_\parallel\|^2 \right)} \tag{4}
\end{equation}

where

\begin{equation}
\|x_\parallel\|^2 = N_{\in T}\, a^2 + (N - N_{\in T})\, b^2 = \frac{N\, N_{\in T}}{N - N_{\in T}}\, a^2 \tag{5}
\end{equation}

Here the $N_\parallel = 2$ degrees of freedom account for the subtraction
of the average, $\langle x \rangle = 0$, and for the one parameter of the
model, $a$, so that the numerator has $N_\parallel - 1 = 1$ degree of freedom.
The above procedure easily extends to weighted
observations. In this case in Eqs. (3) and (5) one should replace
$N$ and $N_{\in T}$ with the corresponding sums of weights.
Elsewhere $N$ remains the number of degrees of freedom.
Additionally, $\langle x \rangle$, $\langle x_{\in T} \rangle$ and $\|x\|^2$ should be weighted
sums.

The implementation of the method is simple. At the
beginning we proceed with binning of the observations into $NH$
even phase bins. Next, we select the bin with the lowest
average as the transit and ignore the remaining ones. We exploit
here an often forgotten property, known at least from the
times of Whittaker and Robinson (1926): the labor of phase
folding and binning may be reduced at least by half by
calculating bin averages and not their variances. For the
selected bin we calculate $a$, $\|x_\parallel\|^2$ from Eq. (5) and $\Theta$, and
use no sums of squares except for $\|x\|^2$ and $N$, calculated
once for all. From the actually observed value of $\Theta$ and the
Fisher-Snedecor $F(\cdot, \cdot\,; \cdot)$ cumulative distribution $P$ one finds
the tail probability

\begin{equation}
Q = NH \left[ P\{\infty\} - P\{F(1, N - N_\parallel; \Theta)\} \right] \tag{6}
\end{equation}

as an estimate of the detection significance. The $NH$ factor
in Eq. (6) accounts for the selection of the transit bin among $NH$
bins in total.
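
The arithmetic behind the statistic is short enough to sketch in C. The helper below is only an illustration under stated assumptions: the name `aovtr_theta` and its argument list are hypothetical (it is not part of the code in Table C1), the data are assumed to be mean-subtracted, and the conversion of $\Theta$ into the tail probability $Q$ is left out because it needs a routine for the cumulative $F$ distribution.

```c
/* Illustrative helper (hypothetical name, not from Table C1).
 * n     - total number of observations N
 * n_in  - number of observations in the selected transit bin
 * a     - average of the mean-subtracted data in that bin
 * sumsq - ||x||^2, sum of squares of the mean-subtracted data
 * Returns the AoVtr statistic Theta, or -1.0 for degenerate input. */
double aovtr_theta(int n, int n_in, double a, double sumsq)
{
    const int n_par = 2;   /* model parameters: the average and a */
    double model, resid;

    if (n <= n_par || n_in <= 0 || n_in >= n)
        return -1.0;
    /* model sum of squares: ||x_par||^2 = N * N_in / (N - N_in) * a^2,
       which already contains b = -N_in * a / (N - N_in) */
    model = (double)n * (double)n_in / (double)(n - n_in) * a * a;
    /* residual sum of squares: ||x||^2 - ||x_par||^2 */
    resid = sumsq - model;
    if (resid <= 0.0)
        return -1.0;
    /* ratio of the two sums of squares, each per degree of freedom:
       1 for the model and N - 2 for the residuals */
    return (double)(n - n_par) * model / ((double)(n_par - 1) * resid);
}
```

For instance, with $N = 10$, two points in transit, $a = -2$ and $\|x\|^2 = 12$, the model sum of squares is 10, the residual is 2 and the function returns $8 \times 10 / 2 = 40$.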



4 METHOD PERFORMANCE 
4.1 Test power criterion 

In order to evaluate the statistical performance of our AoV
method for transits (AoVtr) we apply the test power formalism
of Paper II. The higher the test power $1 - \beta$, the more
sensitive is a given method, where

[Eqs. (7) and (8) did not survive extraction; the legible fragments involve $R$, $A^2 N$, $E$ and $P\{F(1, N - N_\parallel; \Theta)\}$.]

$1 - \alpha$ denotes the significance level, $A^2$ the signal-to-noise
ratio in power units, $N$ and $N_\parallel = 2$ are the numbers of observations
and parameters of the model, and $E$, $V$ and $P$ denote the mean,
variance and cumulative distribution of the Fisher-Snedecor
$F(1, N - N_\parallel)$ distribution, respectively. For practical purposes,
it is convenient to replace the $F$ distribution in Eq. (8)
with the Fisher $Z = (1/2) \log F$ distribution, yielding the
same information. The latter has a nearly gaussian distribution
(e.g. Fisz, 1963).

The actual transit form is assumed to be rectangular,
of width $v$. The model consists of two top hat functions of
width $c$ and $1 - c$, where 1 corresponds to the period length.
The corresponding signal shape factor $\|s_\parallel\|^2$ is derived in
Appendix A (Eq. A7). Note that for a fixed significance level
$1 - \alpha$, the sensitivity of our method remains unchanged as
long as the expression $A^2 N \|s_\parallel\|^2 / \sqrt{2 N_\parallel}$ remains constant.

We adopted a synthetic spectrum of a K0V star ($T_{eff} =
5250$, $\log g = 4.5$, $\log [M/H] = 0.0$, $v_t = 2$ km/s) from
Claret (2000) and computed synthetic transits over a wide
range of filters (from U to K) and various inclination angles
of the system. Over this very broad range, we derived from Eq.
(A7) values $1 > \|s_\parallel\| > 0.95$. This demonstrates the small
effect of the rectangular approximation on the detection efficiency.
On the other hand, Eqs. (A9) and (A11) demonstrate that
using model transits of width differing by a factor of 2 from the
actual one yields $\|s_\parallel\|^2 \approx 1/2$, causing an appreciable loss of
detection efficiency, corresponding to the use of only half of
the total number of data $N$. At this price one gains a
several-fold computational speed-up by avoiding a detailed fit of the
transit width. Our result opens the possibility of making an informed
compromise between speed and statistical efficiency.
Moreover, given the observing strategy of a given transit
survey, it is possible to estimate in advance the range of
periods of transiting planets that will be probed, and therefore
to choose an adapted value of $NH$. It is also
perfectly possible to do the analysis in two passes, one with
$NH = 15$ and a second one with $NH = 30$; this will still be
much more efficient than the currently used methods.

We stress that this new scheme is better than the standard
binning scheme adopted for variable star searches based
on AoV: in the particular case of a transit search, we have
a priori knowledge of the shape of the signal. The signal can
be modeled by a top hat, and the simpler the underlying
model, in terms of its parameter count, the more powerful is
the test (Eq. 7). The ordinary AoV model has $N_\parallel^{AoV} = NH$
parameters (bin averages) and the present one, AoVtr, has
just $N_\parallel^{AoVtr} = 2$ parameters. For matching bin and transit
widths both models fit the light curve the same way,
$\|s_\parallel\|^2_{AoV} = \|s_\parallel\|^2_{AoVtr} = 1$. Thus for the same signal-to-noise
ratio $A^2$ the AoV and AoVtr reach similar sensitivity for
detection as long as $N^{AoV}/\sqrt{N_\parallel^{AoV}} = N^{AoVtr}/\sqrt{N_\parallel^{AoVtr}}$, i.e. for
$\sqrt{30/2}$ times fewer observations for AoVtr. Further examples
concerning the application of the test power formalism to the design
of experiments are provided in Paper II.



4.2 Comparison with other methods 

Paper II employed the test power concept as a general formalism
for the evaluation of the performance of period search methods.
In particular it demonstrated that the sensitivity for detection
depends on the signal model used and not on the
particular choice of statistic/periodogram. According to Paper II,
the sensitivity for detection depends, apart from S/N,
on the match between a light curve and its model implemented
in the search method. However, in this respect any difference
between the realistic and top hat models of transits is
small (Sect. 4.1). No difference in performance is expected
among different methods employing the same model for the
transit light curve if applied to the same data. Another fundamental
fact in statistics is that the optimum method should
involve as few model parameters as absolutely necessary to
decrease the residuals (e.g. Paper II). For this reason sophisticated,
multi-parameter models yield poor sensitivity.

In statistical terms our method is best compared with
the matched filter method (MF) as modified by Tingley
(2003b) (mMF). Note that Eqs. (3) and (4) of Tingley
(2003b) are proportional to our Eqs. (4) and (5). The difference
is that we account properly in Eq. (4) for the variance determined
from the same data and obtain an analytical distribution
in return. This should produce no appreciable difference in
the statistical performance of the two methods (mMF & AoVtr).
We refer the reader to Tingley (2003a) and Tingley (2003b)
for a discussion of the mathematical similarity of the BLS method
of Kovács et al. (2002) and the MF discussed here.

It remains to demonstrate the relation between the cross
correlation (CCF) and sum of squares ($\chi^2$) statistics:

\begin{equation}
\chi^2 \equiv \|x - x_\parallel\|^2 = \|x\|^2 - 2 (x, x_\parallel) + \|x_\parallel\|^2 \tag{9}
\end{equation}

The last term above is non-random (a constant), the first one
is a sum of many terms, therefore random with small variation.
Both are independent of frequency. The term with the
dominant variance is the middle one, as it reduces to the
sum of the few terms in transit. This term corresponds to the
CCF. So, except for sign and a (nearly) constant shift, the
distributions of $\chi^2$ and CCF are identical and yield identical
statistical conclusions. This applies in general to the
MF approach as pursued by Weldrake & Sackett (2005) and
Jenkins and Doyle (2003). Note that in the latter case the
employed model, $x_\parallel$, is different from the transit one. The
best known CCF-like statistic is the power spectrum, consisting
of the sum of squared norms of the sine and cosine CCF functions.
In the time series context Lomb (1976) first demonstrated
for the power spectrum the statistical equivalence of the CCF and $\chi^2$
statistics.
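
The algebraic identity underlying this equivalence, namely that the sum of squared residuals equals the sum of squares of the data, minus twice the cross correlation with the model, plus the sum of squares of the model, is easy to check numerically. The following C sketch does so; the function names are hypothetical and chosen only for this illustration.

```c
/* Scalar product (u, v) of two vectors of length n. */
double scalar_product(const double *u, const double *v, int n)
{
    double acc = 0.0;
    int i;
    for (i = 0; i < n; i++)
        acc += u[i] * v[i];
    return acc;
}

/* Sum of squared residuals ||x - m||^2 computed term by term. */
double chi2_direct(const double *x, const double *m, int n)
{
    double acc = 0.0, d;
    int i;
    for (i = 0; i < n; i++) {
        d = x[i] - m[i];
        acc += d * d;
    }
    return acc;
}

/* The same quantity assembled from ||x||^2, the cross correlation
 * (x, m) and ||m||^2; only the middle term changes with the trial
 * transit position when the model shape is fixed. */
double chi2_via_ccf(const double *x, const double *m, int n)
{
    return scalar_product(x, x, n)
         - 2.0 * scalar_product(x, m, n)
         + scalar_product(m, m, n);
}
```

Both routes give the same number up to rounding, so ranking trial models by the largest cross correlation or by the smallest sum of squares leads to the same choice.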

Comparison with Doyle et al. (2000), who employed the
MF method with the sum of absolute values of residuals
as the test statistic, is more difficult. For gaussian errors
the quadratic norm used in AoVtr and its CCF equivalent
in some MF methods has optimum properties (e.g.
Eadie et al., 1971). However, for certain distributions with
large or no moments the absolute value statistic is known
to perform better than the quadratic one. In such applications
the Doyle et al. (2000) method may be better than those
discussed so far.

On one hand, the performance of the bayesian method of
Defaÿ et al. (2001) in tests by Tingley (2003b) was poor.
On the other hand, Schwarzenberg-Czerny (1998) invoked the
Wold theorem to demonstrate that in the asymptotic limit
of a large data set the performance of bayesian methods should
match that of the classical ones, for a similar setup. This result
is also consistent with Gregory and Loredo (1992).
In this respect the poor performance could arise e.g. from an
implementation inconsistency and/or from small-data behaviour.

4.3 Realistic application 

We applied the AoVtr method to the 142 publicly available
OGLE light curves of periodic transit candidates (Udalski et
al., 2002). With $NH = 30$, for 139 light curves we detect the
signal with the same principal period, always with $\Theta > 15$.
For the remaining 3 the originally claimed period appeared
as an alias.



5 EFFICIENT ALGORITHM 
IMPLEMENTATION 

For gaps as large as those encountered in astronomical observations
from the ground, the FFT-related methods discussed
in Sect. 2 suffer from a large overhead for processing null data
in the gaps. The fastest known methods for observations
with large gaps rely on phase folding and binning of data,
as in the case of the AoV and AoVtr methods. In the simplest
implementation, for each frequency one calculates the phases of
the observations and assigns them to their respective phase bins. For each
observation falling in a given bin, the weight and weighted
sum of this bin are incremented. This is the most labor-consuming
part. A known drawback of phase binning is
the loss of detection efficiency for eclipses/transits
falling at a bin boundary (Schwarzenberg-Czerny, 1999).
An efficient protection against that is to bin the observations
starting at different initial phases (Stellingwerf, 1978).
Each such bin set is called a coverage. The trick preventing
repeated summing of observations for each coverage is to
first bin the observations into sub-bins. Then the whole-bin
sums for each coverage are obtained at the end by summing
the corresponding sub-bins, with negligible overhead.
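
The sub-bin idea can be sketched in C as follows. This is a minimal illustration, not the code of Table C1: the function names, the output layout and the modulo wrap-around of the last bin are assumptions made here, and phases are assumed to lie in the range from 0 (inclusive) to 1 (exclusive).

```c
/* Accumulate observations into nbc = nh * ncov sub-bins (one pass
 * over the data).  sum[] and cnt[] must hold nbc elements. */
void subbin_accumulate(const double *phase, const double *flux, int nobs,
                       int nbc, double *sum, int *cnt)
{
    int i, k;
    for (k = 0; k < nbc; k++) { sum[k] = 0.0; cnt[k] = 0; }
    for (i = 0; i < nobs; i++) {
        k = (int)(phase[i] * nbc);
        if (k >= nbc) k = nbc - 1;   /* guard against rounding at phase ~ 1 */
        sum[k] += flux[i];
        cnt[k] += 1;
    }
}

/* Combine sub-bin sums into whole-bin sums for every coverage.
 * Coverage k is shifted by k sub-bins; bin j of coverage k is stored
 * in out[k * nh + j].  Cost is nh * ncov * ncov additions, independent
 * of the number of observations. */
void coverage_sums(const double *sub, int nh, int ncov, double *out)
{
    int nbc = nh * ncov, j, k, s;
    for (k = 0; k < ncov; k++)
        for (j = 0; j < nh; j++) {
            double acc = 0.0;
            for (s = 0; s < ncov; s++)
                acc += sub[(j * ncov + k + s) % nbc];
            out[k * nh + j] = acc;
        }
}
```

With `nh = 2` and `ncov = 2`, sub-bin sums of 1, 2, 3 and 4 give whole-bin sums of 3 and 7 for the first coverage and 5 and 5 for the second, shifted one; the same routine applied to the counts gives the bin occupancies.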

A simple fixed bin size implementation of phase folding is
prone to occasional failure, both in the numerical and statistical
sense, due to incomplete phase coverage. We found
that a robust yet statistically correct solution for the treatment
of these seldom occurring cases is to sort the observations
in phase and then to bin them evenly according to their sequence
number. Now, provided that for all phases equal to 0
the sort routine preserves the original order intact, our code
would work also for frequency 0. The periodogram value at
frequency 0 is nothing else but a frequency-independent AoV
variability test.

Our sample code implementation in C taking care of
both procedures is presented in Table C1. We stress that
the floor operation should be implemented by simple register
shifts, back and forth, with null filling. Then in the
innermost loop there remain only 4 other floating point operations
per star, observation and frequency. For two coverages
this yields a labor saving by a factor of over 4 compared to the
calculation of variances for each bin and for each coverage
separately.

Note that the same sub-binning concept enables the
implementation of the AoVtr method with variable width
of transits. This requires an extra piece of code checking
whether the inclusion of additional neighbor sub-bins increases
or decreases the detection significance. To facilitate that, one
should store a pre-computed table of critical $F$ values for
$(1, n)$, $n = 1, \ldots, N$ degrees of freedom. Note that values for
large $n$ are going to be rarely used and may be omitted.


6 OPTIMUM FREQUENCY SAMPLING 

The proper selection of the frequencies to be scanned poses
a challenge for the efficient design of the analysis. Nominally
adapted sampling in a period search corresponds to such
a step in frequency $\delta\nu_p$ that over the whole interval of
observation $\Delta t_p$ any essential light curve feature remains
marginally in phase. For a sine curve that corresponds to
$\Delta t_p \delta\nu_p < 1$. However, for transits and other short duty
cycle processes the condition becomes even more stringent,
$\Delta t_p \delta\nu_p < 1/NH$, and therefore the sampling in frequency
would have to be even more dense.

Instead we propose a two-tier approach aimed at saving
a factor of several in computing effort. The aim of the first
scan of the data is to detect with high sensitivity any periodic
variability, with no guarantee of proper period identification.
At this stage we take full advantage of the otherwise annoying
presence of aliases. Thus we strive for the detection of any of the
aliases as an indication of the presence of a periodic signal.
Due to their even spacing, the detection of aliases remains
efficient even for severe undersampling. This occurs due to
a vernier effect, as described by Schwarzenberg-Czerny et al.
(2005). Except for the pathological case of commensurability
of steps, one of the loosely and evenly spaced sampling frequencies
ought to coincide with one of the evenly spaced aliases.
This ensures detection but yields no reliable period value.
Yet at this stage all constant stars are rejected, reducing our
sample by a factor of up to 100. At the next stage it remains
to recalculate the periodogram with proper sampling for the
variable stars detected at stage one. The difficult task of selecting
among several aliases remains, but one gets to that
point in a less exhausting way.

To explain the role of aliases and the vernier principle
let us recall simple facts about the effect of sampling
on the Fourier transform and its norm, the DPS. The final discrete
sampling pattern $w$ may be approximated by the product of
three functions, $w = d \times s \times p$, representing the discrete pattern
of individual exposures, the seasonal pattern and the total duration
of the observing campaign. The corresponding time
scales, $\delta t_d < \Delta t_s < \Delta t_p$, are minutes to days, several months
and several years, respectively. In frequency space they
correspond to the total Nyquist range, to the width of the pattern
of evenly spaced alias peaks for any single real period,
and to the width of an individual peak, $\Delta\nu_d > \delta\nu_s > \delta\nu_p$, where
$\Delta\nu_d = 1/\delta t_d$ and so on for $s$ and $p$. For simplicity we concentrate
here on the most relevant alias pattern, that due to seasons.
Each consecutive sampling point becomes shifted by a
constant step with respect to the aliases. The case of commensurable
steps of aliases and sampling constitutes a rare
pathology. Since aliases are separated by troughs, there are
about $\delta\nu_s / 2\delta\nu_p = \Delta t_p / 2\Delta t_s$ aliases associated with each
single oscillation. In order to detect any of them, it suffices if,
say, 3 or 5 of the sampling frequencies cover the whole pattern
of width $\delta\nu_s$, i.e. if the periodogram is sampled with
the step $\delta\nu_v = 1/4\Delta t_s$. This constitutes a gain by a factor
$\delta\nu_v/\delta\nu_p = \Delta t_p/4\Delta t_s$, corresponding numerically to the duration
of a project, in units of a year.
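
As a worked illustration of these relations, take a hypothetical survey lasting $\Delta t_p = 4$ yr with observing seasons of $\Delta t_s = 0.25$ yr. Both values are assumptions chosen for round numbers, not taken from any particular project, and the vernier step is taken as one quarter of the alias pattern width:

```latex
\begin{gather*}
\delta\nu_p = \frac{1}{\Delta t_p} = 0.25\ \mathrm{yr^{-1}}, \qquad
\delta\nu_s = \frac{1}{\Delta t_s} = 4\ \mathrm{yr^{-1}}, \qquad
\frac{\delta\nu_s}{2\,\delta\nu_p} = \frac{\Delta t_p}{2\,\Delta t_s} = 8 \\
\delta\nu_v = \frac{1}{4\,\Delta t_s} = 1\ \mathrm{yr^{-1}}, \qquad
\frac{\delta\nu_v}{\delta\nu_p} = \frac{\Delta t_p}{4\,\Delta t_s} = 4
\end{gather*}
```

In this example each oscillation is accompanied by about 8 aliases, and the first scan needs about 4 times fewer frequencies than proper sampling, a factor numerically equal to the 4 yr span of the survey.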

In practical terms the suitable frequency step $\delta\nu_v$ is best
selected by trial and error on a survey sub-sample. A suitable
undersampling step should not produce any large loss
of detections compared to the case of proper sampling by
$\delta\nu_p$.



7 NON-GAUSSIAN ERRORS 

The referee raised the important issue of a possible non-gaussian
distribution of errors (the parent distribution). There is no
space here for a thorough discussion of this problem, but
some points are worth mentioning. The applicability of the $F$
statistic in the AoV method depends on the sums of squares
in the numerator and denominator of Eq. (4) obeying the $\chi^2$
distribution. In this respect the more critical is the numerator,
$\|x_\parallel\|^2$ (Eq. 5). For a gaussian parent distribution $\|x_\parallel\|^2$ obeys
the $\chi^2(1)$ distribution by virtue of the Fisher lemma, where
$1 = N_\parallel - 1$. If non-gaussian errors satisfy the assumptions of
the Central Limit Theorem, then in the asymptotic limit of a
large number of transit observations, $N_{\in T} \rightarrow \infty$, the average
$\langle x_{\in T} \rangle$ obeys the gaussian distribution and its square in
Eq. (5) obeys $\chi^2(1)$, as required. The relevant assumption
of the Central Limit Theorem is the existence of bounds on the
moments of the parent distribution. This is not as restrictive
as it may appear. Usually observations are pre-screened so
that their errors have limited magnitude, hence all moments
of the distribution exist.

The real issue is whether the number of observations
per bin, i.e. here per transit, $N_{\in T}$, is sufficient to approach
the asymptotic limit. From the proof of the Central Limit
Theorem it follows that the relevant merit figures for asymmetric
and symmetric parent distributions are $\mu_3/(3! \sqrt{N_{\in T}})$
and $\mu_4/(4! N_{\in T})$, respectively, where $\mu_i$ denotes the $i$-th central
moment of the parent distribution in units of the $i$-th
power of its standard deviation $\sigma$ (e.g. Brandt, 1970). Except
for pathological parent distributions with large high
moments, $\mu_i \gg \sigma^i$, smallness of the merit figures indicates
an approach to the asymptotic limit. Let us consider
the particular case of a strongly asymmetric parent distribution,
$\chi^2(2)$, i.e. the $e^{-x}$ distribution with $\mu_i \sim \sigma^i$. The average
$\langle x_{\in T} \rangle$ obeys the $\chi^2(2 N_{\in T})$ distribution. Fisher (1924) argued
that for $N_{\in T} > 30$ the asymptotic limit is good enough.
His conclusion is subject to the restriction that the significance
level does not exceed the usual range of 0.999. In
order to get $N_{\in T} = 30$ observations in transits, one needs
$N \sim 30\, NH \sim 1000$ observations in total, on average. Thus
our analytical theory should apply directly to a number
of existing transit surveys with $N > 1000$ observations per
candidate object.
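
For the exponential parent distribution the two merit figures can be evaluated explicitly. The standardized central moments of the $e^{-x}$ distribution are $\mu_3 = 2$ and $\mu_4 = 9$ (standard textbook values, quoted here as an aid rather than taken from this paper), so that for $N_{\in T} = 30$:

```latex
\frac{\mu_3}{3!\,\sqrt{N_{\in T}}} = \frac{2}{6\sqrt{30}} \approx 0.06,
\qquad
\frac{\mu_4}{4!\,N_{\in T}} = \frac{9}{24 \times 30} = 0.0125
```

Both figures are small compared to unity, in line with the adopted limit of about 30 observations per transit.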



8 CONCLUSIONS 

We presented arguments for the adoption of a new AoV 
related test particularly suitable for detection of planetary 
transits in stellar light curves. As it is based on just one 
parameter fit its statistical test power is bound to exceed 
that of common variability tests. We demonstrated how by 
careful organization of computations savings of labor by a 
factor of over 10 may be achieved.

APPENDIX A: PROJECTION OF SIGNAL
ONTO MODEL SPACE 

For the test signal we shall use the function $s$ from Eq. (1)
with constants $a$ and $b$ redefined so that $\langle s \rangle = (s, 1) = 0$
and $\|s\|^2 = (s, s) = 1$. However, in the present consideration
scalar products involving sums over observations should be
replaced with integrals of the function product over the entire
range of phases: $(f, g) = \int f(\varphi) g(\varphi)\, d\varphi$. In such a case we
obtain

\begin{equation}
a = -\sqrt{\frac{1 - v}{v}} \tag{A1}
\end{equation}

\begin{equation}
b = \sqrt{\frac{v}{1 - v}} \tag{A2}
\end{equation}



The norm of the signal $s$ projected onto its model function,
$\|s_\parallel\|^2$, has to be calculated following the prescription
from Paper II:

\begin{equation}
\|s_\parallel\|^2 = \sum_{l=1}^{N_\parallel} (s, \phi^{(l)})^2 \tag{A3}
\end{equation}



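We assume that the test signal and the transit model are rectangular,
of different widths, $v$ and $c$, respectively. The orthonormal
model functions $\phi^{(l)}(x)$ for AoVtr, corresponding
to the in/out of transit phases, are adopted as:

\begin{equation}
\phi^{(1)}(x) = \begin{cases} 1/\sqrt{c} & \text{for } 0 < x < c \\ 0 & \text{otherwise} \end{cases} \tag{A4}
\end{equation}

\begin{equation}
\phi^{(2)}(x) = \begin{cases} 1/\sqrt{1 - c} & \text{for } c < x < 1 \\ 0 & \text{otherwise} \end{cases} \tag{A5}
\end{equation}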
(A6) 

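Substituting these definitions into Eq. (A3), for $v > c$ one
obtains

\begin{equation}
\|s_\parallel\|^2 = \frac{(1 - v)\, c}{(1 - c)\, v} \tag{A7}
\end{equation}

For $v < c$ one should swap $v$ and $c$ in the above equation.
The following particular results are of interest here:

\begin{equation}
0 \le \|s_\parallel\|^2 \le 1 \tag{A8}
\end{equation}

\begin{equation}
\|s_\parallel\|^2 \rightarrow \min\left( \frac{c}{v}, \frac{v}{c} \right) \quad \text{for } v, c \ll 1 \tag{A9}
\end{equation}

\begin{equation}
\|s_\parallel\|^2 \rightarrow 1 \quad \text{for } v \rightarrow c \tag{A10}
\end{equation}

\begin{equation}
\|s_\parallel\|^2 = \frac{NH - 2}{2 (NH - 1)} \quad \text{for } v = 2/NH,\ c = 1/NH \tag{A11}
\end{equation}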


APPENDIX B: LIST OF SYMBOLS 

A - signal-to-noise amplitude ratio; 

a - parameter of the model (brightness in transit); 

b - parameter of the model (brightness out of transit); 

c - transit width in the model signal, c = l/NH; 

E - the expected value of a distribution; 

FFT - fast fourier transform; 

F(Ni, N2\-) - the Fisher-Snedecor probability density dis- 
tribution for iVi and A2 degrees of freedom, abbreviated as 
F(JVi,JV a ); 

Ho - the statistical null hypothesis stating that a signal 
consists of pure noise; 

Hq(NH) - the statistical alternative hypothesis stating 
that a signal consists of noise plus deterministic component 
(e.g. transit of width l/NH); 

i — p,s,d,v sampling indices - correspond to (p) 
proper sampling/full span of observations, (s) sesonal 
span/frequency pattern (1 yr), (d) day span/frequency 
pattern and (v) the vernier sampling pattern; 

N - total number of observations; 

NF - number of frequencies in the periodogram; 

NH - Period length in units of transit width, also optimum 
number of bins for ordinary no top-hat AoV method; 

NS - number of stars observed in a whole survey; 

iVgT - number of observations in transit; 

iVy = Nu AoVtr - number of parameters of the top-hat 
model, TVii = 2; 



\\AoV 



number of parameters of the phase binned; model 



N UoV = NH; 

P - the cumulative probability distribution; 

R(0, 1; •) - a normalized probability distribution, such that 
E{R} = and V{-R} = 1, e.g. normalized F, normalized 
Z = (1/2) log F or normal distribution; 

S/N - signal-to-noise power ratio, i.e. ratio of squared am- 
plitudes S/N = A 2 ; 

s - the normalized actual deterministic signal, < s >= 
and ||s|| 2 = 1; 

|| s || || 2 - squared normalized projection of the actual deter- 
ministic signal s onto the assumed top hat model, x«, cor- 
responding to the squared cosine between vectors s and x\\, 
hence IIsm || 2 = (s ™ ,J : " "~ 



7{IMI 



-'}; 



T - integration variable for O F statistics; 
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V - the variance value of a distribution; 
v - transit width in the actual signal; 
x - values of observations, < x >= 0; 

xu - values of model light curve, < xu >= 0, in terms of 

the orthogonal components x\\(t) = ^2i=i( x \\j 0^)0^ 

i 6 t - values of observations in transit, < x^t >= a; 

$\delta\nu_i$ - where i = p, s, d, v, the frequency i.e. periodogram 
sampling interval, $\delta\nu_i = 1/\Delta t_i$; 

$\Delta\nu_i$ - where i = p, s, d, v, the total frequency span of a 
feature or of the periodogram; 

$\delta t_i$ - where i = p, s, d, v, the time i.e. observation sampling 
interval; 

$\Delta t_i$ - where i = p, s, d, v, the time span of a feature or the 
observations; 

$\phi^{(l)}$ - normalized top hat functions covering transit (l = 
1) and out of transit (l = 2) bins; 

$\chi^2(M; \cdot)$ - the $\chi^2$ probability density distribution for M 
degrees of freedom, abbreviated as $\chi^2(M)$; 

$\Theta$ - the observed value of F statistics; 

(•,•) - scalar product (possibly weighted); 

$\langle \cdot \rangle$ - average value of argument, $\langle x \rangle = (1, x)$; 

$\| \cdot \|^2$ - quadratic norm of argument, i.e. (possibly 
weighted) sum of squares $\|x\|^2 = (x, x)$; 
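
As an illustration of the last three definitions, the weighted scalar product, the average and the quadratic norm could be coded as in the minimal sketch below. The function names are hypothetical and do not appear in the published code. The normalization by the sum of weights is an assumption made here so that the relation between the average and the scalar product with a unit vector holds.

```c
#include <stddef.h>

/* Weighted scalar product (x, y) = sum_i w_i x_i y_i / sum_i w_i.
   Pass w == NULL for equal weights. Dividing by the sum of weights is
   an assumption of this sketch; it makes (1, x) equal to the average. */
double scalar_product(size_t n, const double x[], const double y[],
                      const double w[])
{
    double sxy = 0.0, sw = 0.0;
    for (size_t i = 0; i < n; i++) {
        double wi = w ? w[i] : 1.0;
        sxy += wi * x[i] * y[i];
        sw += wi;
    }
    return sw > 0.0 ? sxy / sw : 0.0;
}

/* Average <x> = (1, x): scalar product of x with a vector of ones. */
double average(size_t n, const double x[], const double w[])
{
    double sx = 0.0, sw = 0.0;
    for (size_t i = 0; i < n; i++) {
        double wi = w ? w[i] : 1.0;
        sx += wi * x[i];
        sw += wi;
    }
    return sw > 0.0 ? sx / sw : 0.0;
}

/* Quadratic norm ||x||^2 = (x, x). */
double norm2(size_t n, const double x[], const double w[])
{
    return scalar_product(n, x, x, w);
}
```

With this normalization the norm is a mean rather than a plain sum of squares; multiplying by the number of observations recovers the sum used for `vf` in the sample code.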



APPENDIX C: SOURCE CODE 

Sample implementation of the transit periodogram in C code 
is presented in Table C1. 

Table C1. Sample C implementation of the transit periodogram. 



int aov (int nobs, TIME tin[], FLOAT fin[], int nh, int ncov, 
         LONG nfr, TIME fr0, TIME frs, FLOAT *th) 
{ 
  /* (C) by Alex Schwarzenberg-Czerny, 2003, 2005 */ 
  int nct[MAXBIN], ind[MAXOBS], i, ibin, ip, nbc; 
  LONG ifr, iflex; 
  FLOAT f[MAXOBS], ph[MAXOBS], ave[MAXBIN], af, vf, sav; 
  TIME t[MAXOBS], fr, at, dbc, dph; 

  if (((nh+1)*ncov > MAXBIN) || (nobs > MAXOBS) || 
      (nobs <= nh+nh)) 
  { 
    /* fprintf(stderr, "AOV: error: wrong size of arrays\n"); */ 
    return(-1); 
  } 
  nbc = nh * ncov; 
  dbc = (TIME) nbc; 

  /* calculate totals and normalize variables */ 
  iflex = 0; at = (TIME) (af = vf = (FLOAT)0.); 
  for (i = 0; i < nobs; i++) { af += fin[i]; at += tin[i]; } 
  af /= (FLOAT) nobs; at /= (TIME) nobs; 
  for (i = 0; i < nobs; i++) 
  { 
    t[i] = tin[i] - at; 
    f[i] = (sav = fin[i] - af); 
    vf += sav*sav; 
  } 
  /* assumed: sum(f[])=0, sum(f[]*f[])=vf and sum(t[]) is small */ 

  for (ifr = 0; ifr < nfr; ifr++) /* Loop over frequencies */ 
  { 
    fr = ((TIME) ifr) * frs + fr0; 
    for (ip = 0; ip < 2; ip++) 
    { 
      for (i = 0; i < nbc; i++) { ave[i] = 0.; nct[i] = 0; } 
      if (ip == 0) /* Try default fixed bins ... */ 
        for (i = 0; i < nobs; i++) /* MOST LABOR HERE */ 
        { 
          dph = t[i]*fr; /* TIME dph, t, fr */ 
          ph[i] = (sav = (FLOAT)(dph - floor(dph))); 
          ibin = (int) floor(sav*dbc); 
          ave[ibin] += f[i]; 
          ++nct[ibin]; 
        } 
      else /* ... and elastic bins, if necessary */ 
      { 
        ++iflex; /* sort index ind using key ph */ 
        sortx(nobs, ph, ind); /* corrected NR indexx would do */ 
        for (i = 0; i < nobs; i++) 
        { 
          ibin = i*nbc/nobs; 
          ave[ibin] += f[ind[i]]; 
          ++nct[ibin]; 
        } 
      } 
      /* counts: sub-bins=>bins */ 
      for (i = 0; i < ncov; i++) nct[i+nbc] = nct[i]; 
      ibin = 0; 
      for (i = ncov+nbc-1; i >= 0; i--) nct[i] = (ibin += nct[i]); 
      for (i = 0; i < nbc; i++) nct[i] -= nct[i+ncov]; 
      for (i = 0; i < nbc; i++) /* check bin occupation */ 
        if (nct[i] < CTMIN) break; 
      if (i >= nbc) break; /* all bins sufficiently populated */ 
    } 
    /* data: sub-bins=>bins */ 
    for (i = 0; i < ncov; i++) ave[i+nbc] = ave[i]; 
    sav = 0.; 
    for (i = ncov+nbc-1; i >= 0; i--) ave[i] = (sav += ave[i]); 
    for (i = 0; i < nbc; i++) ave[i] -= ave[i+ncov]; 

    /* AoV statistics for transits */ 
    sav = ave[0]/nct[0]; 
    for (i = 0; i < nbc; i++) 
      if ((ave[i] /= nct[i]) >= sav) { sav = ave[i]; ibin = i; } 
    sav *= sav*nct[ibin]*nobs/(FLOAT)(nobs-nct[ibin]); 
    th[ifr] = sav/MAX(vf-sav, 1e-32)*(nobs-2); 
  } /* where 'vf' keeps the total sum of squares of f[] */ 

  /* the same for the ordinary AoV statistics: 
    sav = 0.; 
    for (i = 0; i < nbc; i++) sav += (ave[i]*ave[i]/nct[i]); 
    sav /= (FLOAT) ncov; 
    th[ifr] = sav/(nh-1)/MAX(vf-sav, 1e-32)*(nobs-nh); 
  }; */ 

  /* if (iflex > 0) fprintf(stderr, "aov:warning:" 
     "poor phase coverage at %d frequencies\n", iflex); */ 
  return(0); 
} 



Input: 

nobs - number of observations; 

tin[nobs], fin[nobs] - times and values of observations; 
nh, ncov - number of phase bins and number of coverages; 
nfr - number of frequencies; 
fr0, frs - frequency start and step; 



Parameter: 

TIME - extended precision type, e.g. #define TIME double 
MAXBIN - maximum (nh+1)*ncov; 
MAXOBS - maximum number of observations; 
CTMIN - minimum bin occupancy >1, e.g. #define CTMIN 5 

Output: 

th[nfr] - the AoV periodogram; 
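
The two closing stages of the frequency loop in the table, i.e. the merging of sub-bins into overlapping bins and the evaluation of the top-hat statistic, may be easier to follow in isolation. The sketch below restates them as two stand-alone functions. The names `merge_subbins` and `transit_theta` and the use of plain `double` are assumptions of this illustration and are not part of the published code.

```c
/* Merge ncov consecutive sub-bins (cyclically) into overlapping bins.
   On entry sum[] and cnt[] hold nbc sub-bin totals; both arrays need
   room for nbc + ncov elements. On exit element i (i < nbc) holds the
   total over sub-bins i, i+1, ..., i+ncov-1, indices taken modulo nbc. */
void merge_subbins(int nbc, int ncov, double sum[], int cnt[])
{
    int i, ctot = 0;
    double stot = 0.0;

    for (i = 0; i < ncov; i++) {            /* wrap-around copy */
        sum[i + nbc] = sum[i];
        cnt[i + nbc] = cnt[i];
    }
    for (i = nbc + ncov - 1; i >= 0; i--) { /* reverse cumulative sums */
        sum[i] = (stot += sum[i]);
        cnt[i] = (ctot += cnt[i]);
    }
    for (i = 0; i < nbc; i++) {             /* difference = window total */
        sum[i] -= sum[i + ncov];
        cnt[i] -= cnt[i + ncov];
    }
}

/* Top-hat AoV statistic from bin totals of zero-mean data.
   vf is the total sum of squares of the data; every cnt[i] must be
   positive and smaller than nobs. The bin with the largest mean is
   taken as the transit, as in the sample code. */
double transit_theta(int nobs, double vf, int nbc,
                     const double sum[], const int cnt[])
{
    int i, ibin = 0;
    double best = sum[0] / cnt[0];
    double s1, resid;

    for (i = 1; i < nbc; i++) {
        double mean = sum[i] / cnt[i];
        if (mean >= best) { best = mean; ibin = i; }
    }
    /* sum of squares explained by the two-level (top-hat) model */
    s1 = best * best * cnt[ibin] * nobs / (double)(nobs - cnt[ibin]);
    resid = vf - s1;
    if (resid < 1e-32) resid = 1e-32;       /* guard against division by zero */
    return s1 / resid * (nobs - 2);
}
```

The reverse cumulative sums make the cost of merging proportional to the number of sub-bins and independent of ncov, so each observation is still binned only once per frequency.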

The complete source code and a test example may be downloaded from the web address http://www.camk.edu.pl/~alex/#software 
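
The table calls an index-sort helper, `sortx(nobs, ph, ind)`, which is not listed there. For readers wishing to experiment before downloading the full package, a possible stand-in built on the standard `qsort` is sketched below. Its signature is inferred from the call in the table; the `int` return value, the `KeyIdx` structure and the fallback definition of `FLOAT` as `float` are assumptions of this sketch and may differ from the author's routine.

```c
#include <stdlib.h>

#ifndef FLOAT
#define FLOAT float   /* assumption: single precision data type */
#endif

/* Key/index pair sorted by qsort (hypothetical helper type). */
typedef struct { FLOAT key; int idx; } KeyIdx;

static int cmp_keyidx(const void *a, const void *b)
{
    const KeyIdx *p = (const KeyIdx *) a;
    const KeyIdx *q = (const KeyIdx *) b;
    if (p->key < q->key) return -1;
    if (p->key > q->key) return 1;
    /* equal keys: fall back on the original position */
    return (p->idx > q->idx) - (p->idx < q->idx);
}

/* Fill ind[0..n-1] so that key[ind[0]] <= key[ind[1]] <= ... ;
   key[] itself is left unchanged.
   Returns 0 on success, -1 if memory cannot be allocated. */
int sortx(int n, const FLOAT key[], int ind[])
{
    KeyIdx *tmp;
    int i;

    if (n <= 0) return 0;
    tmp = (KeyIdx *) malloc((size_t) n * sizeof *tmp);
    if (tmp == NULL) return -1;
    for (i = 0; i < n; i++) { tmp[i].key = key[i]; tmp[i].idx = i; }
    qsort(tmp, (size_t) n, sizeof *tmp, cmp_keyidx);
    for (i = 0; i < n; i++) ind[i] = tmp[i].idx;
    free(tmp);
    return 0;
}
```

Allocating a temporary array at every call is wasteful inside a frequency loop; a production version would more likely reuse a work buffer, since the elastic-bin branch is only expected to run at frequencies with poor phase coverage.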



